Unsupervised Adaptation of Polyp Segmentation Models via Coarse-to-Fine Self-Supervision
Unsupervised Domain Adaptation~(UDA) has attracted a surge of interest over
the past decade but remains difficult to apply in real-world applications.
Considering the privacy-preservation issues and security concerns, in this
work, we study a practical problem of Source-Free Domain Adaptation (SFDA),
which eliminates the reliance on annotated source data. Current SFDA methods
focus on extracting domain knowledge from the source-trained model but neglect
the intrinsic structure of the target domain. Moreover, they typically utilize
pseudo labels for self-training in the target domain, but suffer from the
notorious error accumulation problem. To address these issues, we propose a new
SFDA framework, called Region-to-Pixel Adaptation Network~(RPANet), which
learns the region-level and pixel-level discriminative representations through
coarse-to-fine self-supervision. The proposed RPANet consists of two modules,
Foreground-aware Contrastive Learning (FCL) and Confidence-Calibrated
Pseudo-Labeling (CCPL), which explicitly address the key challenges of ``how to
distinguish'' and ``how to refine''. To be specific, FCL introduces a
supervised contrastive learning paradigm in the region level to contrast
different region centroids across different target images, which efficiently
leverages all pseudo labels while remaining robust to noisy samples. CCPL designs a novel
fusion strategy to reduce the overconfidence problem of pseudo labels by fusing
two different target predictions without introducing any additional network
modules. Extensive experiments on three cross-domain polyp segmentation tasks
reveal that RPANet significantly outperforms state-of-the-art SFDA and UDA
methods without access to source data, revealing the potential of SFDA in
medical applications.
Comment: Accepted by IPMI 202
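To make the two modules concrete, here is a minimal sketch of how CCPL-style prediction fusion and FCL-style region centroids could look. The abstract does not give RPANet's exact formulations, so the fusion rule, the centroid construction, and every name below are illustrative assumptions, not the authors' implementation.

import torch

def fuse_predictions(prob_a, prob_b):
    """CCPL-style fusion (assumed form): average two target predictions,
    e.g. from two views of the same image, so the resulting pseudo labels
    are less overconfident than either prediction alone."""
    fused = 0.5 * (prob_a + prob_b)    # soft average tempers confidence
    conf, pseudo = fused.max(dim=1)    # per-pixel confidence and label
    return pseudo, conf

def region_centroids(features, pseudo, num_classes):
    """FCL-style region centroids (assumed form): the mean feature over all
    pixels assigned to a class, one centroid per class per image, usable as
    region-level anchors for supervised contrastive learning."""
    b, c, _, _ = features.shape
    feats = features.permute(0, 2, 3, 1).reshape(b, -1, c)  # (B, HW, C)
    labels = pseudo.reshape(b, -1)                          # (B, HW)
    centroids = []
    for k in range(num_classes):
        mask = (labels == k).float().unsqueeze(-1)          # (B, HW, 1)
        centroids.append((feats * mask).sum(1) / mask.sum(1).clamp(min=1))
    return torch.stack(centroids, dim=1)                    # (B, K, C)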
Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints
The increasing capabilities of large language models (LLMs) raise
opportunities for artificial general intelligence but concurrently amplify
safety concerns, such as potential misuse of AI systems, necessitating
effective AI alignment. Reinforcement Learning from Human Feedback (RLHF) has
emerged as a promising pathway towards AI alignment but brings forth challenges
due to its complexity and dependence on a separate reward model. Direct
Preference Optimization (DPO) has been proposed as an alternative, and it
remains equivalent to RLHF under the reverse KL regularization constraint. This
paper presents f-DPO, a generalized approach to DPO that incorporates diverse
divergence constraints. We show that under certain f-divergences, including
the Jensen-Shannon divergence, forward KL divergence, and α-divergences, the
complex relationship between the reward and optimal policy can also be
simplified by addressing the Karush-Kuhn-Tucker conditions. This eliminates the
need for estimating the normalizing constant in the Bradley-Terry model and
enables a tractable mapping between the reward function and the optimal policy.
Our approach optimizes LLMs to align with human preferences in a more efficient
and supervised manner under a broad set of divergence constraints. Empirically,
adopting these divergences ensures a balance between alignment performance and
generation diversity. Importantly, f-DPO outperforms PPO-based methods in
divergence efficiency, and divergence constraints directly influence expected
calibration error (ECE).
Comment: Preprint
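As a sketch of the key identity (using the standard definition $D_f(\pi \,\|\, \pi_{\mathrm{ref}}) = \mathbb{E}_{\pi_{\mathrm{ref}}}[f(\pi/\pi_{\mathrm{ref}})]$ with $f'$ invertible; the sign and ratio conventions here are our reading of the abstract, not necessarily the paper's exact notation), the Karush-Kuhn-Tucker conditions give
$$r(x, y) = \beta\, f'\!\left(\frac{\pi^{*}(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}\right) + \mathrm{const},$$
and the constant cancels in the Bradley-Terry difference $r(x, y_w) - r(x, y_l)$, which is why no normalizing constant needs to be estimated. Reverse KL ($f(u) = u \log u$, so $f'(u) = 1 + \log u$) recovers DPO's $\beta \log\bigl(\pi(y \mid x)/\pi_{\mathrm{ref}}(y \mid x)\bigr)$ reward up to a constant.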
Effect of composting and soil type on dissipation of veterinary antibiotics in land-applied manures
The objective of this study was to determine the fate of commonly used veterinary antibiotics in their naturally excreted form when manure-based amendments are applied to soil. Beef cattle were administered sulfamethazine, tylosin, and chlortetracycline, and dairy cows were treated with pirlimycin, according to standard animal production practice. The resulting manure was composted for 42 days under static or turned conditions and applied at agronomic N rates to sandy, silt, and silty clay loam soils, and compared with amendment with the corresponding raw manures in sacrificial microcosms over a 120-day period. Antibiotic dissipation in the raw manure-amended soils followed bi-phasic first-order kinetics. The first-phase half-lives for sulfamethazine, tylosin, chlortetracycline, and pirlimycin ranged from 6.0 to 18 days, 2.7 to 3.7 days, 23 to 25 days, and 5.5 to 8.2 days, respectively. During the second phase, dissipation of sulfamethazine was negligible, while the half-lives for tylosin, chlortetracycline, and pirlimycin ranged from 41 to 44 days, 75 to 144 days, and 87 to 142 days, respectively. By contrast, antibiotic dissipation in the compost-amended soils followed single-phase first-order kinetics, with negligible dissipation of sulfamethazine and half-lives of tylosin and chlortetracycline ranging from 15 to 16 days and 49 to 104 days, respectively. Pirlimycin was below the detection limit in the compost-amended soils. After 120 days of incubation, antibiotic concentrations in the compost-amended soils (up to 3.1 μg/kg) were significantly lower than those in the manure-amended soils (up to 19 μg/kg; p < 0.0001), with no major effect of soil type on dissipation. Risk assessment suggested that composting manure can reduce the antibiotic-resistance selection potential of manure-amended soils.
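For readers who want the kinetic models in executable form, here is a minimal sketch of the single-phase and bi-phasic first-order dissipation curves and the half-life relation t1/2 = ln(2)/k; the function names and parameterization are illustrative, not the authors' fitting code.

import numpy as np

def single_phase(t, c0, k):
    """Single-phase first-order dissipation, C(t) = C0 * exp(-k t),
    as observed in the compost-amended soils."""
    return c0 * np.exp(-k * t)

def bi_phasic(t, c0, frac_fast, k_fast, k_slow):
    """Bi-phasic first-order dissipation, as observed in the raw
    manure-amended soils: a fast pool and a slow pool, each decaying
    exponentially."""
    return c0 * (frac_fast * np.exp(-k_fast * t)
                 + (1.0 - frac_fast) * np.exp(-k_slow * t))

def half_life(k):
    """First-order half-life, t1/2 = ln(2)/k (days when k is per day)."""
    return np.log(2) / k

# Example: a first-phase tylosin half-life of 2.7 days corresponds to
# k_fast = ln(2)/2.7 ≈ 0.26 per day.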
MoTiAC: Multi-Objective Actor-Critics for Real-Time Bidding
Online real-time bidding (RTB) is a complex auction game in which ad
platforms must weigh various influential key performance indicators
(KPIs), such as revenue and return on investment (ROI). The trade-off among these
competing goals needs to be balanced on a massive scale. To address the
problem, we propose a multi-objective reinforcement learning algorithm, named
MoTiAC, for the problem of bidding optimization with various goals.
Specifically, instead of using a fixed, linear combination of multiple
objectives, MoTiAC computes adaptive weights over time based on how well the
current state agrees with the agent's prior. In addition, we analyze the
properties of the model update and prove that Pareto optimality can be
guaranteed. We demonstrate the effectiveness of our method
on a real-world commercial dataset. Experiments show that the model outperforms
all state-of-the-art baselines.
Comment: 8 Pages, Extensive Experiments
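A minimal sketch of the adaptive-weighting idea follows. The abstract does not specify the weighting rule, so the softmax-over-agreement form, the prior representation, and all names below are illustrative assumptions.

import torch

def adaptive_weights(objective_returns, prior_preferences, temperature=1.0):
    """Weight each objective by how well its current return agrees with
    the agent's prior preference, rather than fixing a linear scalarization."""
    agreement = -(objective_returns - prior_preferences).abs()  # closer = higher weight
    return torch.softmax(agreement / temperature, dim=-1)

def combined_actor_loss(per_objective_losses, weights):
    """Blend per-objective actor-critic losses with the adaptive weights;
    the weights are detached so they act as constants in the policy gradient."""
    return (weights.detach() * per_objective_losses).sum()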
A Survey on Knowledge-Enhanced Pre-trained Language Models
Natural Language Processing (NLP) has been revolutionized by the use of
Pre-trained Language Models (PLMs) such as BERT. Despite setting new records in
nearly every NLP task, PLMs still face a number of challenges including poor
interpretability, weak reasoning capability, and the need for large amounts of
expensive annotated data when applied to downstream tasks. By integrating
external knowledge into PLMs,
\textit{\underline{K}nowledge-\underline{E}nhanced \underline{P}re-trained
\underline{L}anguage \underline{M}odels} (KEPLMs) have the potential to
overcome the above-mentioned limitations. In this paper, we examine KEPLMs
systematically through a series of studies. Specifically, we outline the common
types and different formats of knowledge to be integrated into KEPLMs, detail
the existing methods for building and evaluating KEPLMs, present the
applications of KEPLMs in downstream tasks, and discuss the future research
directions. Researchers will benefit from this survey by gaining a quick and
comprehensive overview of the latest developments in this field.
Comment: 19 pages, 12 figures, 192 references
Active Policy Improvement from Multiple Black-box Oracles
Reinforcement learning (RL) has made significant strides in various complex
domains. However, identifying an effective policy via RL often necessitates
extensive exploration. Imitation learning aims to mitigate this issue by using
expert demonstrations to guide exploration. In real-world scenarios, one often
has access to multiple suboptimal black-box experts, rather than a single
optimal oracle. These experts do not universally outperform each other across
all states, presenting a challenge in actively deciding which oracle to use and
in which state. We introduce MAPS and MAPS-SE, a class of policy improvement
algorithms that perform imitation learning from multiple suboptimal oracles. In
particular, MAPS actively selects which of the oracles to imitate and improves
their value function estimates, and MAPS-SE additionally leverages an active
state exploration criterion to determine which states one should explore. We
provide a comprehensive theoretical analysis and demonstrate that MAPS and
MAPS-SE enjoy a sample-efficiency advantage over the state-of-the-art policy
improvement algorithms. Empirical results show that MAPS-SE significantly
accelerates policy optimization via state-wise imitation learning from multiple
oracles across a broad spectrum of control tasks in the DeepMind Control Suite.
Our code is publicly available at: https://github.com/ripl/maps
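A minimal sketch of the selection logic described above (an illustrative optimism-based rule; none of the names or criteria below come from the paper, so see the linked repository for the actual algorithm):

import numpy as np

def select_oracle(state_values, uncertainty_bonus):
    """Pick the oracle to imitate at the current state: state_values[k] is
    the estimated value of oracle k's policy here, plus an optimism bonus
    so oracles with uncertain estimates still get tried."""
    scores = np.asarray(state_values) + np.asarray(uncertainty_bonus)
    return int(np.argmax(scores))

def should_explore(value_uncertainty, threshold):
    """MAPS-SE-style state-exploration criterion (assumed form): explore
    states where the value estimate of the selected oracle is still uncertain."""
    return value_uncertainty > threshold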